PyDigger - unearthing stuff about Python


| Name | Version | Summary | Date |
|------|---------|---------|------|
| superoptix | 0.1.0b8 | Full Stack Agentic AI Framework | 2025-08-02 15:48:45 |
| zeroeval | 0.6.8 | ZeroEval SDK | 2025-08-02 06:18:53 |
| openjury | 0.1.0 | Python SDK for evaluating multiple model outputs using configurable LLM-based jurors | 2025-08-01 19:36:43 |
| python-flexeval | 0.1.5 | FlexEval is a tool for designing custom metrics, completion functions, and LLM-graded rubrics for evaluating the behavior of LLM-powered systems. | 2025-08-01 01:20:35 |
| llama-index-packs-rag-evaluator | 0.4.0 | llama-index packs rag_evaluator integration | 2025-07-30 20:54:25 |
| dyff-audit | 0.11.1 | Audit tools for the Dyff AI auditing platform. | 2025-07-30 17:35:43 |
| agenta | 0.50.3 | The SDK for agenta, an open-source LLMOps platform. | 2025-07-29 17:42:14 |
| quotientai | 0.4.6 | Python library for tracing, logging, and detecting problems with AI Agents | 2025-07-29 14:28:52 |
| trajectopy | 3.1.2 | Trajectory Evaluation in Python | 2025-07-29 12:42:26 |
| dyff-client | 0.18.0 | Python client for the Dyff AI auditing platform. | 2025-07-28 18:51:39 |
| pymcpevals | 0.1.1 | Python package for evaluating MCP (Model Context Protocol) server implementations using LLM-based scoring | 2025-07-27 07:17:20 |
| mandoline | 0.4.0 | Official Python client for the Mandoline API | 2025-07-26 20:32:40 |
| SurvivalEVAL | 0.4.5 | The most comprehensive Python package for evaluating survival analysis models. | 2025-07-26 06:19:12 |
| dyff-schema | 0.30.1 | Data models for the Dyff AI auditing platform. | 2025-07-25 17:35:17 |
| evalassist | 0.1.20 | EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refining evaluation criteria in a web-based user experience. | 2025-07-25 16:44:14 |
| monitoring-rag | 0.0.2 | A comprehensive, framework-agnostic library for evaluating Retrieval-Augmented Generation (RAG) pipelines. | 2025-07-24 11:25:53 |
| novaeval | 0.4.0 | A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models | 2025-07-22 19:20:41 |
| evalscope | 0.17.1 | EvalScope: Lightweight LLMs Evaluation Framework | 2025-07-21 02:12:56 |
| grandjury | 1.0.1 | Python client for GrandJury server API - collective intelligence for model evaluation | 2025-07-18 05:08:40 |
| rag-evaluation | 0.2.2 | A robust Python package for evaluating Retrieval-Augmented Generation (RAG) systems. | 2025-07-17 08:30:01 |